Learning values across many orders of magnitude

Authors

  • Hado P. van Hasselt
  • Arthur Guez
  • Matteo Hessel
  • Volodymyr Mnih
  • David Silver
Abstract

Most learning algorithms are not invariant to the scale of the signal that is being approximated. We propose to adaptively normalize the targets used in the learning updates. This is important in value-based reinforcement learning, where the magnitude of appropriate value approximations can change over time as the behavior policy is updated. Our main motivation is prior work on learning to play Atari games, where the rewards were clipped to a predetermined range. This clipping facilitates learning across many different games with a single learning algorithm, but a clipped reward function can result in qualitatively different behavior. Using adaptive normalization, we can remove this domain-specific heuristic without diminishing overall performance.
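The abstract does not spell out the normalization scheme. The Python sketch below shows one way adaptive target normalization can work. It tracks the mean and second moment of the targets with exponential moving averages. Whenever those statistics move, it rescales a linear output layer so that predictions in the original scale stay unchanged, and it then takes the gradient step on the normalized error. The class name `NormalizedLinearHead`, the step sizes `beta` and `lr`, the variance floor `eps`, and the restriction to a scalar linear head on fixed features are all hypothetical choices for illustration. None of them is taken from the authors' implementation.

```python
import math


class NormalizedLinearHead:
    """Scalar linear head whose training targets are adaptively normalized.

    Hypothetical sketch, not the authors' code: the prediction is
    sigma * (w . h + b) + mu, where mu and sigma track the targets through
    exponential moving averages with step size `beta` (an assumed choice).
    """

    def __init__(self, n_features, beta=0.01, lr=0.1, eps=1e-6):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.mu = 0.0   # running mean of the targets
        self.nu = 1.0   # running mean of the squared targets
        self.beta = beta
        self.lr = lr
        self.eps = eps  # floor on the variance so sigma stays positive

    @property
    def sigma(self):
        return math.sqrt(max(self.nu - self.mu ** 2, self.eps))

    def normalized_output(self, h):
        return sum(wi * hi for wi, hi in zip(self.w, h)) + self.b

    def predict(self, h):
        # Map the normalized output back to the original target scale.
        return self.sigma * self.normalized_output(h) + self.mu

    def update_stats(self, target):
        old_mu, old_sigma = self.mu, self.sigma
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        new_sigma = self.sigma
        # Rescale w and b so that predict(h) is unchanged for every h.
        self.w = [wi * old_sigma / new_sigma for wi in self.w]
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma

    def sgd_step(self, h, target):
        # Squared-error gradient step on the normalized target, so the size
        # of each update is governed by the normalized error, not the raw scale.
        error = self.normalized_output(h) - (target - self.mu) / self.sigma
        self.w = [wi - self.lr * error * hi for wi, hi in zip(self.w, h)]
        self.b -= self.lr * error

    def update(self, h, target):
        self.update_stats(target)
        self.sgd_step(h, target)


# Targets jump by three orders of magnitude; the same lr is used throughout.
head = NormalizedLinearHead(n_features=1)
for _ in range(500):
    head.update([1.0], 1.0)
for _ in range(2000):
    head.update([1.0], 1000.0)
print(head.predict([1.0]))
```

The rescaling in `update_stats` is the key design choice. If `mu` and `sigma` were simply overwritten, every change in the statistics would also shift all existing predictions. Preserving the unnormalized outputs lets the statistics adapt freely while learning proceeds in a normalized space.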

Similar resources

Learning functions across many orders of magnitudes

Most learning algorithms are not invariant to the scale of the function that is being approximated. We propose to adaptively normalize the targets used in learning. This is useful in value-based reinforcement learning, where the magnitude of appropriate value approximations can change over time when we update the policy of behavior. Our main motivation is prior work on learning to play Atari ga...

Image Classification via Sparse Representation and Subspace Alignment

Image representation is a crucial problem in image processing, where there exist many low-level representations of images, e.g., SIFT, HOG, and so on. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis, are employed to d...

Packing density of rigid aggregates is independent of scale.

Large planetary seedlings, comets, microscale pharmaceuticals, and nanoscale soot particles are made from rigid, aggregated subunits that are compacted under low compression into larger structures spanning over 10 orders of magnitude in dimensional space. Here, we demonstrate that the packing density (θf) of compacted rigid aggregates is independent of spatial scale for systems under weak compa...

Learning to schedule new orders in batch plants using approximate dynamic programming

Production scheduling in a wide range of batch plants involves minimizing tardiness of batches already scheduled when inserting new orders. This problem is addressed here as learning an “order insertion policy” using intensive simulations in the framework of approximate dynamic programming (ADP). Simple insertion operators are defined and the values of choosing them at different schedule states...

An Efficient Method for Bayesian Network Parameter Learning from Incomplete Data

We propose an efficient method for estimating the parameters of a Bayesian network, from incomplete datasets, i.e., datasets containing variables with missing values. In contrast to textbook approaches such as EM and the gradient method, our approach is non-iterative, yields closed form parameter estimates, and eliminates the need for inference in a Bayesian network. Our approach is capable of ...

Publication date: 2016